ABSTRACT

In the design phase of a system, how does a design engineer or manager choose between a subsystem
with .990 reliability and a more costly subsystem with .995 reliability? When is the increased cost justified?

High reliability is not necessarily an end in itself but may be desirable in order to reduce the expected cost
due to subsystem failure. However, this may not be the wisest use of funds since the expected cost due to
subsystem failure is not the only cost involved. The subsystem itself may be very costly. We should not
consider either the cost of the subsystem or the expected cost due to subsystem failure separately but
should minimize the total of the two costs, i.e., the total of the cost of the subsystem plus the expected cost
due to subsystem failure.


ASSUMPTIONS AND NOTATION 

In this paper we assume perfect switching devices (if needed) of negligible cost and independence of the
subsystem modules.

NOTATION

n       number of modules in the subsystem
k       minimum number of good modules for the subsystem to be good
r       reliability of the whole system for other than failure of the subsystem
r_s     reliability of the subsystem
c_1     loss due to failure of the subsystem
c_2     loss due to subsystem output at v_c (for models 3, 4, and 5)
c_3     cost of a one module subsystem capable of full output
c_4     cost of a module in a k-out-of-n:G subsystem when k is fixed (see later discussion)
g(k)    function which relates cost of subsystem to the number of modules in the subsystem
v_c     fraction of subsystem output necessary so that the mission is not a failure
p       probability that a module is good
q       probability that a module fails, or 1-p
C       the total of the cost of the subsystem itself plus the expected loss due to subsystem failure
λ       failure rate of a module (models 4 and 5)
T_0     mission time


INTRODUCTION 


Since expected value is an important ingredient in our quest for finding the best subsystem, consider the
expected cost due to subsystem failure, denoted E{cost due to subsystem failure}. As with all expected
values, it depends upon both the dollar cost and the probability of its occurrence. If we let Pr mean
'probability of', then E{cost} = cost x Pr{cost occurrence}. Let c_1 be the cost due to failure of the
subsystem, including all costs incurred by subsystem failure (but not the cost of the subsystem itself). This
number could be the entire cost of the main system (or even greater) if failure of the subsystem resulted in
failure of the main system. In other instances c_1 would be less than the cost of the main system, e.g., if
failure of the subsystem resulted in only a partial failure of the main system.

Now the expected cost due to subsystem failure is c_1 times the probability that this cost will be
experienced. To experience a cost due to subsystem failure, two events must occur, namely: 1. the main system
must be good, and 2. the subsystem must fail. For example, if the main system (a rocket) is not good (e.g.,
a launch is canceled or the rocket explodes for some reason other than the subsystem being considered),
then a cost due to subsystem failure cannot occur. So for our expected cost we want to consider
Pr{main system good ∩ subsystem failure}. Let r be the reliability of the main system (for other than
failure of the subsystem) and let r_s be the reliability of the subsystem. [We will also use the fact that
Pr{A and B} = Pr{A ∩ B} = Pr{A} Pr{B | A}.] Then

E{cost due to subsystem failure} = c_1 Pr{main system good ∩ subsystem failure}

        = c_1 Pr{main system good} Pr{subsystem failure | main system good}

        = c_1 r (1 - r_s) = r c_1 (1 - r_s).

We can minimize this expected cost by building a subsystem with an extremely low probability of failure
(high reliability). However, it is not clear that we should build the most reliable subsystem possible, since
this will minimize only the expected cost due to subsystem failure and does not consider the cost of building
the subsystem itself. We should not consider the two costs separately. We therefore minimize the total of
the two costs, i.e., the total of the cost of the subsystem plus the expected cost due to subsystem failure.
The total cost to be minimized is

C = cost of the subsystem + E{cost due to subsystem failure}

  = cost of the subsystem + r c_1 (1 - r_s)                    (1)

In minimizing cost C we see that we are balancing the cost of the subsystem and the expected cost due to
subsystem failure.


SELECTING THE BETTER SUBSYSTEM 

Suppose that we are considering two subsystems. Subsystem 1, which costs $200,000, has a .97 reliability.
Subsystem 2, with a cost of $100,000, has a .94 reliability. Without further analysis, there is no clear 'best'
subsystem and the choice is often based upon the amount budgeted for the subsystem.

Assume that the two subsystems under consideration will be part of a main system which has a reliability
(exclusive of the subsystem under consideration) of r = .96. We'll further assume that failure of the
subsystem will result in a cost of c_1 = $10,000,000. Let us first compare the E{cost due to subsystem
failure} for each of the two subsystems.

For subsystem 1,

E{cost due to subsystem failure} = r c_1 Pr{subsystem failure}

        = r c_1 (1 - r_s1)

        = .96 x $10,000,000 x .03 = $288,000.

For subsystem 2,

E{cost due to subsystem failure} = r c_1 (1 - r_s2)

        = .96 x $10,000,000 x .06 = $576,000.

Subsystem 2 has a higher expected cost due to failure than subsystem 1. However, since subsystem 2 is also
less expensive, we need to compare the overall expected cost, C, for 1 and for 2.

For subsystem 1,

C_1 = $200,000 + $288,000 = $488,000.

For subsystem 2,

C_2 = $100,000 + $576,000 = $676,000.

Since C_1 < C_2, we select subsystem 1 over subsystem 2.

For further information on expected values or on selecting the best subsystem, see [3]. 
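
The comparison above can be reproduced in a few lines. The following is a minimal Python sketch, not the authors' program; the function name total_cost and its parameter names are illustrative assumptions.

```python
def total_cost(subsystem_cost, r, c1, rs):
    """Equation (1): subsystem cost plus expected cost due to subsystem failure.

    r  : reliability of the main system (exclusive of the subsystem)
    c1 : cost incurred if the subsystem fails
    rs : reliability of the subsystem
    """
    return subsystem_cost + r * c1 * (1.0 - rs)


# Values from the example: r = .96 and a failure cost of $10,000,000.
C1 = total_cost(200_000, r=0.96, c1=10_000_000, rs=0.97)
C2 = total_cost(100_000, r=0.96, c1=10_000_000, rs=0.94)
best = 1 if C1 < C2 else 2

print(round(C1), round(C2), best)  # prints: 488000 676000 1
```

The same function can be called with other values of r or of the failure cost to see when the cheaper, less reliable subsystem becomes the better choice.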


K-OUT-OF-N:G SUBSYSTEMS

In this article we'll direct our attention to a specific type of subsystem, called a k-out-of-n:G subsystem. 
Such a subsystem has n modules, of which k are required to be good for the subsystem to be good. As 
an example consider the situation where the engineer has a certain power requirement. He may meet this 
requirement by having one large power module, two smaller modules, etc. The number of modules 
required is called k. For example, the engineer may decide that k = 4. Then each module is 1/4 of the full 
required power. Therefore, the subsystem must have 4 or more modules for the full required power. The 
number of modules used in the subsystem is called n. For example, an n = 6 and k = 4 subsystem would
have 6 modules each of 1/4 power and thus would have the output capability of 1.5 times the required
power. The engineer chooses n and k. Selection of the different values of n and k results in different
subsystems, each with different costs and reliabilities. Since each n and k yields different subsystems with 
different costs, we can choose the subsystem (the n and k) which will minimize cost C. 


MODEL 1 

The simplest k-out-of-n:G model is one where the modules are independent and all have common
probability of being good p and common probability of failure q = 1-p. Let X count the number of good
modules. Now

E{cost due to subsystem failure} = r c_1 Pr{subsystem failure} = r c_1 Pr{X < k}, so that

\[ E\{\text{cost due to subsystem failure}\} = r c_1 \sum_{x=0}^{k-1} \binom{n}{x} p^{x} q^{n-x}. \qquad (2) \]


Recall that C = cost of subsystem + E{cost due to subsystem failure}. We therefore need to also
consider the cost of the subsystem. First consider a simple situation where k is fixed. Here we are free to
choose n. Then n-k will be the redundancy or number of spares in the subsystem. If each module costs
c_4, then the cost of subsystem = n c_4. Using this with (2) we obtain

C = cost of subsystem + E{cost due to subsystem failure}

\[ C = n c_4 + r c_1 \sum_{x=0}^{k-1} \binom{n}{x} p^{x} q^{n-x}. \]


We wish to find the n which minimizes cost C. 

The authors have written a BASIC program (QuickBASIC 4.5) to find the n which minimizes C.
Additionally this program will, if you desire, graph C as a function of either p or c_1. The program will plot
the best subsystems (i.e., the ones with the lowest values of C) over ranges of p or c_1. This allows you
not only to select the best subsystem for a particular value of p or c_1 but also to view what happens to C
for nearby values of p or c_1.

As an example, consider the situation when k = 1, where only one module is required to be operational
for the subsystem to be operational. The reliability of this single module is estimated to be .95 (p = .95).
Let the reliability of the system for other than failure of the subsystem be .9 (r = .9). The cost of one
module is 1 million dollars (c_4 = 1); throughout the remainder of the paper all costs will be in millions of
dollars. The cost due to failure of this subsystem is 10 (c_1 = 10).

Figure 1 shows a plot of C for p ranging from .79 to .99 and n's of 1 through 4. When the reliability of a
single module is p = .95, n = 1 has the lowest value of C. Therefore the best subsystem in this case is
one with no spares. We see from figure 1 that the n = 1 subsystem (no spares) has the lowest value
of C for any p > .87. If p < .87, then n = 2 (one spare) has the lowest value of C. For p < .79, we
would view the graph over the range of p < .79.
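
The search for the best n in model 1 can be sketched as follows. This is a minimal Python illustration of the cost expression above, not the authors' QuickBASIC program; the function names and the search bound n_max are assumptions made for the example.

```python
from math import comb


def failure_probability(n, k, p):
    """Pr{X < k} for X ~ Binomial(n, p): fewer than k good modules."""
    q = 1.0 - p
    return sum(comb(n, x) * p**x * q ** (n - x) for x in range(k))


def model1_cost(n, k, p, r, c1, c4):
    """Total cost C = n*c4 + r*c1*Pr{X < k} for a k-out-of-n:G subsystem."""
    return n * c4 + r * c1 * failure_probability(n, k, p)


def best_n(k, p, r, c1, c4, n_max=20):
    """Return the n in k..n_max with the smallest C (n_max is an assumed bound)."""
    return min(range(k, n_max + 1), key=lambda n: model1_cost(n, k, p, r, c1, c4))


# Example from the text: k = 1, p = .95, r = .9, module cost 1, failure cost 10
# (all costs in millions of dollars).
for n in range(1, 5):
    print(n, round(model1_cost(n, 1, 0.95, 0.9, 10, 1), 4))
print("best n:", best_n(1, 0.95, 0.9, 10, 1))
```

With these inputs the n = 1 subsystem gives C = 1.45, which is lower than C for n = 2, 3 or 4, in agreement with the reading of figure 1 at p = .95.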



Figure 1

Figure 2


Now suppose instead that c_1 (cost due to failure of the subsystem) is 50. Figure 2 shows the plot of C
for c_1 = 50. We first note that if p = .95, then the n = 2 subsystem is the best. Comparing figures 1
and 2 (at p = .95) we see that the larger value of c_1 (in figure 2) requires a larger value of n. This
principle holds in general: if the cost of subsystem failure increases, then more redundancy is required.
If .83 < p < .98, figure 2 shows that the n = 2 subsystem is best. If p is below .83 then more
redundancy (n = 3) is required. If p > .98, then no redundancy (n = 1) is required.
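
The ranges of p quoted for the failure cost of 50 can be checked by scanning p and recording where the best n changes. The sketch below is a hedged Python illustration for the k = 1 case only (the subsystem fails only if all n modules fail); the helper names, the step of .01 and the bound n_max are assumptions.

```python
def model1_cost_k1(n, p, r, c1, c4):
    """Model 1 with k = 1: C = n*c4 + r*c1*q**n, where q = 1 - p."""
    return n * c4 + r * c1 * (1.0 - p) ** n


def best_n_k1(p, r=0.9, c1=50.0, c4=1.0, n_max=10):
    """Best n for k = 1 over 1..n_max (n_max is an assumed search bound)."""
    return min(range(1, n_max + 1), key=lambda n: model1_cost_k1(n, p, r, c1, c4))


# Scan p from .79 to .99 in steps of .01 and report where the best n changes.
previous = None
for i in range(79, 100):
    p = i / 100
    n = best_n_k1(p)
    if n != previous:
        print(f"p = {p:.2f}: best n = {n}")
        previous = n
```

On this grid the best n moves from 3 to 2 and then to 1 as p increases, which is the pattern read from figure 2; a finer grid would locate the crossover points more precisely.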






MODEL 2 


If, in model 1, we are also free to choose k in our subsystem, then we have model 2. Let c_3 be the cost
of a subsystem consisting of exactly one module. Further suppose that the cost of a subsystem with
exactly k modules is c_3 g(k). Here g(k) is the factor which measures the (generally) increased cost of
building a subsystem consisting of k smaller modules rather than one large module. If g(k) = 1 for all k
then a subsystem of k modules costs the same as a subsystem consisting of a single module. Any g(k)
may be used. For example, if a subsystem of 2 smaller modules costs 4 times as much as a single
module subsystem then g(2) = 4. Therefore this subsystem would cost c_3 g(k) = c_3 g(2) = 4c_3. If a
subsystem of 3 smaller modules costs 7 times as much as a single module subsystem then g(3) = 7.
Other values for g(k) may be defined in a similar manner. Therefore, in the above example, g(1) = 1,
g(2) = 4, g(3) = 7, etc. We also assume that each module in the subsystem costs c_3 g(k)/k, which is
1/k of the total cost for k modules. Since we have a total of n modules in the subsystem, the cost
of the subsystem = n c_3 g(k)/k. Using this with (2) we obtain

C = cost of subsystem + E{loss due to subsystem failure}

\[ C = n c_3 g(k)/k + r c_1 \sum_{x=0}^{k-1} \binom{n}{x} p^{x} q^{n-x}. \]


For any particular situation with given values of c_1, c_3, r, p and g(k) we use the BASIC program to select
the n and k to minimize C as given above. There are two options for g(k) built into the BASIC program.
You may choose either g(k) = (1+b)g(k-1) or g(k) = k(1/k)^c, where you are free to set b or c.

If you believe that the cost of building a subsystem of k modules increases (or decreases) by a fixed
fraction b with each additional module, then you would choose the first option g(k) = (1+b)g(k-1), with
b > 0 (b < 0). For example, if building a subsystem of two smaller modules costs 20% more than building a
single module subsystem, 3 modules costs 20% more than a subsystem of two modules, etc., then let b = .2.
If you believe that the cost of building a subsystem follows a power law in the number of modules in the
subsystem then you would choose the second option g(k) = k(1/k)^c. For example, consider building a space
electrical power subsystem. A rough rule of thumb says that the cost of smaller modules for a space
electrical power subsystem is proportional to the electrical power raised to the .7, i.e., g(k) = k(1/k)^.7.
Therefore, a subsystem consisting of a single module capable of full power costs c_3 g(1) = c_3 x 1(1/1)^.7
= 1.0c_3. A subsystem consisting of 2 modules, each of 1/2 power, costs c_3 g(2) = c_3 x 2(1/2)^.7 = 1.23c_3
to build, etc. An n = 3 and k = 2 subsystem (one having 3 modules each of 1/2 power) costs n c_3 g(k)/k
= 3c_3 x 2(1/2)^.7 / 2 = 3c_3 x 1.23/2 = 1.85c_3 to build.
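
The two options for g(k) and the resulting subsystem cost can be sketched in Python as below. The function names are illustrative assumptions, and the first option is written in closed form assuming g(1) = 1, as in the example in the text.

```python
def g_recursive(k, b):
    """First option: g(1) = 1 and g(k) = (1 + b) * g(k - 1), i.e. (1 + b)**(k - 1)."""
    return (1.0 + b) ** (k - 1)


def g_power(k, c):
    """Second option: g(k) = k * (1/k)**c."""
    return k * (1.0 / k) ** c


def subsystem_cost(n, k, c3, g):
    """Cost of n modules, each costing c3 * g(k) / k."""
    return n * c3 * g(k) / k


print(round(g_power(2, 0.7), 2))                                       # 1.23
print(round(subsystem_cost(3, 2, 1.0, lambda k: g_power(k, 0.7)), 2))  # 1.85
print(round(g_recursive(3, 0.2), 2))                                   # 1.44
```

The first two printed values reproduce the 1.23 and 1.85 multiples of the single-module cost worked out above; the third shows that with b = .2 a three-module subsystem costs 44% more than a single-module one.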

As an example of model 2, suppose we are building a space electrical power subsystem. The cost due
to subsystem failure, c_1, is 240. Let the reliability of the system for other than failure of the subsystem
be .9 (r = .9). Suppose that the cost of building a single module capable of full power is 1 (c_3 = 1).
Using the rule of thumb stated above, we use the option for g(k) with c = .7. All of the above values
are entered into the BASIC program as parameters. An estimate of p, the reliability of an individual
module, is .96. If we are unsure of this estimate, we can use the BASIC program to view (figure 3) the
best subsystems over p ranging from .89 to .99.

From figure 3, at p = .96, the n = 2, k = 1 subsystem is best (lowest value of C). If p < .95,
the n = 4, k = 2 subsystem is best. Note that this is a flatter curve over the range of p, indicating a low
value for C over a wide range of p.
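
A brute-force search over n and k for this example might look like the following Python sketch. It is not the authors' BASIC program; the function names and the search bounds k_max and n_max are assumptions, and g(k) = k(1/k)^c is used with c = .7 as in the text.

```python
from math import comb


def model2_cost(n, k, p, r, c1, c3, c):
    """C = n*c3*g(k)/k + r*c1*Pr{X < k}, with g(k) = k*(1/k)**c."""
    q = 1.0 - p
    g = k * (1.0 / k) ** c
    p_fail = sum(comb(n, x) * p**x * q ** (n - x) for x in range(k))
    return n * c3 * g / k + r * c1 * p_fail


def best_subsystem(p, r=0.9, c1=240.0, c3=1.0, c=0.7, k_max=5, n_max=10):
    """Exhaustive search over 1 <= k <= k_max, k <= n <= n_max (assumed bounds)."""
    candidates = [(n, k) for k in range(1, k_max + 1) for n in range(k, n_max + 1)]
    return min(candidates, key=lambda nk: model2_cost(nk[0], nk[1], p, r, c1, c3, c))


print(best_subsystem(0.96))  # (n, k) with the lowest C at p = .96
print(best_subsystem(0.94))  # (n, k) with the lowest C at p = .94
```

Within these bounds the search returns (2, 1) at p = .96 and (4, 2) at p = .94, consistent with the subsystems identified from figure 3.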



Figure 3


Figure 4


For the same example, suppose we wish to view what happens to C as c_1 varies. Figure 4 (from the
BASIC program) shows that if c_1 is below 310, then the n = 2, k = 1 subsystem is best. However, for 310
< c_1 < 400, the n = 5, k = 3 subsystem is the best. For c_1 > 400 the n = 4, k = 2 subsystem is
the best. This type of analysis could be used whenever you are unsure of c_1 and wish to consider
results over a range of values.


MODEL 3


Figure 5 shows the loss due to subsystem failure, where v is the ratio of the actual output of the
subsystem to the specification output. If v drops below some critical value v_c, the mission is a
complete failure and the loss is c_1. However, if v is at v_c, then the loss is only c_2. As v increases
above v_c, this loss decreases until there is no loss at full output.

Although h is linear in figure 5, other loss functions, e.g., a decreasing multi-step function,
may also be appropriate. If h(v) = a - av, v_c ≤ v ≤ 1, a = c_2/(1-v_c), then (1) becomes

\[ C = n c_3 g(k)/k + r c_1 \sum_{x=0}^{x<kv_c} \binom{n}{x} p^{x} q^{n-x} + r \sum_{x \ge kv_c}^{k-1} \binom{n}{x} p^{x} q^{n-x} (a - ax/k). \]

The third term on the rhs is the expected loss due to partial failure of the subsystem. Again we can find, by
means of the BASIC program, the n and k which minimize C.
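
The model 3 cost with the linear loss function can be sketched in Python as follows. The function name, the argument names and the choice g(k) = k(1/k)^c for the subsystem cost are assumptions made for illustration. The comparison x < k*vc is done in floating point, so values of vc for which k*vc falls exactly on an integer deserve care.

```python
from math import comb


def model3_cost(n, k, p, r, c1, c2, c3, vc, c=0.7):
    """Model 3 cost with linear loss h(v) = a - a*v on vc <= v <= 1, a = c2/(1 - vc)."""
    q = 1.0 - p
    a = c2 / (1.0 - vc)
    cost = n * c3 * (k * (1.0 / k) ** c) / k        # n * c3 * g(k) / k
    for x in range(k):                               # x good modules, x <= k - 1
        prob = comb(n, x) * p**x * q ** (n - x)
        if x < k * vc:
            cost += r * c1 * prob                    # output below vc: full loss c1
        else:
            cost += r * prob * (a - a * x / k)       # partial loss at output v = x/k
    return cost


# Illustrative call: n = 3, k = 2, with one good module giving exactly half output.
print(model3_cost(3, 2, 0.9, 0.9, c1=100, c2=40, c3=1, vc=0.5))
```

For k = 1 there is no partial-output state, so the function reduces to the model 2 cost; for k = 2 and vc = .5 the single-good-module state contributes the loss at the critical output fraction.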

MODEL 4 

Suppose in model 3 (with c_1 = c_2) that mission time is also important. If modules fail exponentially with
failure rate λ, then the probability of a module still operating successfully at time t is exp(-λt). Let f(x,t)
be the joint probability density function of x successes and time t. We will use the fact that f(x,t) = g(x)
f(t | x). Now f(t | x) is the density of the waiting time for the (n-x)th failure, given that n-x failures
have occurred before mission time T_0. Then

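\[ f(t \mid x) = \frac{L(t \mid x)}{\int_0^{T_0} L(t \mid x)\,dt}, \qquad 0 < t < T_0, \]

where

\[ L(t \mid x) = \frac{n!}{x!\,(n-x-1)!}\,[\exp(-\lambda t)]^{x}\,\lambda \exp(-\lambda t)\,[1-\exp(-\lambda t)]^{n-x-1}, \]

and

\[ f(x,t) = f(t \mid x)\,g(x), \]

where

\[ g(x) = \binom{n}{x} [\exp(-\lambda T_0)]^{x} [1-\exp(-\lambda T_0)]^{n-x}, \qquad x = 0, 1, \ldots, n. \]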


Loss Function for Model 3

Figure 5


Note: g(x) is the probability of exactly x successes in n modules at mission time T_0.

L(t | x) is a probability density function (pdf) which, when integrated from t_1 to t_2, yields the probability
that the (n-x)th failed module (the last module which can fail with x remaining good modules) will occur
at a time between t_1 and t_2. We can write L(t | x) = R·n·S where

\[ R = \binom{n-1}{x} [\exp(-\lambda t)]^{x} [1-\exp(-\lambda t)]^{n-x-1}, \]

\[ S = \lambda \exp(-\lambda t). \]

R is a pdf, which when integrated from 0 to t, yields the probability that, with n-1 modules, (n-x-1) failed
modules and x good modules will occur before time t. S is a pdf, which when integrated from t_1 to t_2,
yields the probability that a module will fail between time t_1 and t_2. Since any of the n modules can fail
at time t, we multiply R by n·S to obtain L(t | x). Now \( \int_0^{T_0} L(t \mid x)\,dt \) gives the probability
that the (n-x)th failed module occurs in (0, T_0). Since we wish to define f(t | x) as a probability density
function on 0 < t < T_0, we must have \( \int_0^{T_0} f(t \mid x)\,dt = 1 \), and so we divide L(t | x) by
\( \int_0^{T_0} L(t \mid x)\,dt \) to obtain f(t | x).
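
The normalization of L(t | x) can be checked numerically. The Python sketch below integrates L(t | x) over (0, T_0) with a simple trapezoid rule and compares the result with the probability that at most x modules are still good at T_0, which is the same event as the (n-x)th failure occurring before T_0. The function names and the numerical values (n = 4, x = 2, a 7-year mission taken as 7 x 8760 hours, and a failure rate of 3.5e-6 per hour) are illustrative assumptions.

```python
from math import comb, exp, factorial


def L(t, x, n, lam):
    """L(t | x): density of the time of the (n - x)th module failure."""
    coef = factorial(n) / (factorial(x) * factorial(n - x - 1))
    surv = exp(-lam * t)
    return coef * surv**x * lam * surv * (1.0 - surv) ** (n - x - 1)


def integrate(f, lo, hi, steps=20000):
    """Composite trapezoid rule (a simple stand-in for exact integration)."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi)) + sum(f(lo + i * h) for i in range(1, steps))
    return total * h


def prob_at_most_x_good(x, n, lam, T0):
    """Pr{at most x modules good at T0}: the sum of g(i) for i = 0..x."""
    s = exp(-lam * T0)
    return sum(comb(n, i) * s**i * (1.0 - s) ** (n - i) for i in range(x + 1))


n, x, lam, T0 = 4, 2, 3.5e-6, 7 * 8760.0     # illustrative values; T0 in hours
norm = integrate(lambda t: L(t, x, n, lam), 0.0, T0)


def f(t):
    """f(t | x): L(t | x) rescaled to integrate to 1 on 0 < t < T0."""
    return L(t, x, n, lam) / norm


print(norm, prob_at_most_x_good(x, n, lam, T0))
```

The two printed numbers should agree to within the integration error, and f integrates to 1 over the mission, as required of a conditional density.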

If the output fraction is v_c at the start of the mission, our loss is c_2. As v increases above v_c, this
loss decreases until there is no loss at full output. With output at or above v_c, losses decrease with
increasing time until there is no loss beyond mission time T_0. Additionally, for any given t, h(v,t) decreases
as v increases above v_c.



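Figure 6

Now (1) becomes

\[ C = n c_3 g(k)/k + r \sum_{x=0}^{k-1} \int_0^{T_0} h(x/k, t)\, f(x,t)\, dt. \qquad (3) \]

If we let

\[ h(x/k, t) = d(x/k) \sum_{j=0}^{m} b_j t^{j}, \qquad (4) \]

\[ A = \sum_{i=0}^{n-x-1} \binom{n-x-1}{i} (-1)^{i+1} [\lambda(x+i+1)]^{-1} \left[ \exp(-\lambda T_0 (x+i+1)) - 1 \right], \]

\[ B = \exp(-\lambda T_0 (x+i+1)) \left( T_0 [\lambda(x+i+1)]^{-1} + [\lambda(x+i+1)]^{-2} \right) - [\lambda(x+i+1)]^{-2}, \]

then, after integrating, (3) becomes

\[ C = n c_3 g(k)/k + r \sum_{j=0}^{m} \left\{ \sum_{x=0}^{k-1} b_j\, \phi(x)\, d(x/k) \sum_{i=0}^{n-x-1} \binom{n-x-1}{i} (-1)^{i+1} \left[ \exp(-\lambda T_0 (x+i+1)) \sum_{l=0}^{j} \frac{j!}{l!}\, T_0^{\,l}\, [\lambda(x+i+1)]^{l-j-1} - j!\,[\lambda(x+i+1)]^{-j-1} \right] \right\} \qquad (5) \]

where \( \phi(x) = g(x) A^{-1} \).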


We wish to find the n and k which minimize C. Minimizing C in (5) is appropriate for any loss function,
h( ), of the form given in (4). Using the loss function given in figure 6, for 0 ≤ x < kv_c, d(x/k) = 1, m = 1,
b_0 = c_1 and b_1 = -c_1 T_0^{-1}. For kv_c ≤ x ≤ k-1 we have d(x/k) = 1 - x/k, m = 1, b_0 = a and
b_1 = -a T_0^{-1}, where a = c_2 (1-v_c)^{-1} with 0 < v_c < 1. Using (5) we obtain

\[ C = n c_3 g(k)/k + r \Bigg\{ c_1 \sum_{x=0}^{x<kv_c} \phi(x) A - c_1 T_0^{-1} \sum_{x=0}^{x<kv_c} \phi(x) \sum_{i=0}^{n-x-1} (-1)^{i+1} \binom{n-x-1}{i} B \]
\[ \qquad + a \sum_{x \ge kv_c}^{k-1} \phi(x) (1 - x/k) A - a T_0^{-1} \sum_{x \ge kv_c}^{k-1} \phi(x) (1 - x/k) \sum_{i=0}^{n-x-1} (-1)^{i+1} \binom{n-x-1}{i} B \Bigg\}. \]
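
As a cross-check on the closed-form expression, the model 4 cost can also be evaluated by integrating the loss against f(x,t) numerically. The Python sketch below does this for the loss h = d(x/k)(b_0 + b_1 t) with b_1 = -b_0/T_0 described above. It is a sketch under stated assumptions rather than the authors' BASIC program: the function names, the trapezoid rule, the choice g(k) = k(1/k)^c and the sample value of the module cost are all illustrative.

```python
from math import comb, exp


def integrate(f, lo, hi, steps=4000):
    """Composite trapezoid rule."""
    h = (hi - lo) / steps
    return h * (0.5 * (f(lo) + f(hi)) + sum(f(lo + i * h) for i in range(1, steps)))


def model4_cost(n, k, lam, T0, r, c1, c2, c3, vc, c=0.7):
    """Subsystem cost plus expected time-dependent loss, integrated numerically."""
    a = c2 / (1.0 - vc)
    cost = n * c3 * (k * (1.0 / k) ** c) / k          # n * c3 * g(k) / k
    s = exp(-lam * T0)
    for x in range(k):
        g_x = comb(n, x) * s**x * (1.0 - s) ** (n - x)
        # Unnormalized L(t | x); constant factors cancel when forming f(t | x).
        shape = lambda t: exp(-lam * (x + 1) * t) * (1.0 - exp(-lam * t)) ** (n - x - 1)
        # b0 = c1 below the critical output, a * d(x/k) = a * (1 - x/k) otherwise.
        b0 = c1 if x < k * vc else a * (1.0 - x / k)
        loss = lambda t: b0 * (1.0 - t / T0)          # b1 = -b0 / T0
        num = integrate(lambda t: loss(t) * shape(t), 0.0, T0)
        den = integrate(shape, 0.0, T0)
        cost += r * g_x * num / den
    return cost


# Illustrative call; the module cost c3 = 2.0 is an assumed value.
print(model4_cost(2, 1, 3.5e-6, 7 * 8760.0, r=0.9, c1=163.0, c2=163.0, c3=2.0, vc=0.1))
```

For n = 1 and k = 1 the integral has a simple closed form, c_1 [1 - (1 - exp(-λT_0))/(λT_0)], which gives a convenient check on the numerical routine.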
MODEL 4 APPLICATIONS


Model 4 might reasonably be applied to non-recoverable systems which, at the end of their
service life, have no intrinsic or salvage value or which are prohibitively expensive to recover.

Examples include undersea sonar systems anchored in deep water, instrument/telemetry
packages located in remote regions, or communications satellites in geosynchronous orbit.

For a geosynchronous communications satellite a number of subsystems could be chosen as an
example. Let us examine the satellite power system, which can be divided into smaller identical modules.

We again use the rule of thumb which says that the cost of a space power subsystem is proportional to
the electrical power raised to the .7 (g(k) = k(1/k)^.7). Suppose that the mission life is 7 years and the
reliability of the satellite (exclusive of the power subsystem) over the mission life is .90. Because the
satellite needs power for stationkeeping, computers and cooling, at least 10% of the specification power is
needed for the satellite to survive. Therefore, v_c is 0.1. The satellite generates $2 million per month
revenue. In the event of satellite failure, a new satellite could be launched within two years at a cost of
$115 million. Therefore c_1 (or c_2) = 163 (115 plus 48 in lost revenue). Here we will assume that revenue
is roughly proportional to power, i.e., if a module of the power subsystem fails, then one or more channels
are no longer available. We estimate the failure rate, λ, as 3.5(10^-6) failures per hour (hr^-1) and again
use the BASIC program to view C over a range of λ from 1(10^-6) to 6(10^-6) hr^-1. Figure 7 shows the 5
best subsystems. For λ < 4(10^-6) hr^-1, the n = 2, k = 1 subsystem is optimal. For λ > 4(10^-6) hr^-1,
the n = 3, k = 1 subsystem is optimal.

Figure 7


MODEL 5


Suppose we have a situation similar to model 4 but now assume a loss of c_1 if the output fraction from
the subsystem is below v_c anytime during the life of the mission.

Model 5 could be applied to recoverable systems, systems which have inherent salvage value, or manned
systems. Examples include manned aircraft or spacecraft, recoverable undersea vehicles or spacecraft.
Model 5 implies that if the output fraction of the subsystem falls below the critical value v_c, something
catastrophic will occur, such as loss of the whole system or loss of life. With these systems, loss or
significant degradation of a critical subsystem might cause loss of the craft and occupants. An example of
such a loss function is given by figure 8.

With this loss function, for x < kv_c, b_0 = c_1 and b_1 = 0, and for kv_c ≤ x ≤ k-1 we have
d(x/k) = 1 - x/k, m = 1, b_0 = a and b_1 = -a T_0^{-1}, where a = c_2 (1 - v_c)^{-1} with 0 < v_c < 1.
Using (5) with these values gives the total cost C for model 5.

The BASIC program can again be used to view C over a range of either λ or c_1.


BASIC PROGRAMS 

The authors will be sending copies of the BASIC program to selected organizations in the United States
for initial testing. It is anticipated that the BASIC program will become available in the future through
NASA's Computer Software Management and Information Center (COSMIC).


SUMMARY


Table 1 contains a summary of the five models which can be applied in a redundancy cost analysis. 


Table 1 

Redundancy Cost Models Considered in this Paper 

Model 1    Simplest cost model. The subsystem consists of n modules, of which k are required for
           success of the mission. If fewer than k modules are good, a loss of c_1 occurs. In model
           1, k is fixed.

Model 2    Same as model 1 except k may also vary. The g(k) cost function is also available to be
           used where increased redundancy brings in more (non-linear) cost.

Model 3    Model 3 expands on models 1 and 2. Linear (or other) loss functions are utilized. If
           fewer than k modules are good, some loss will occur but not necessarily the entire loss of
           c_1. The loss which occurs depends upon some critical output fraction v_c.

Model 4    Model 4 considers time in the loss function. Modules in the subsystem fail exponentially
           with rate λ.

Model 5    Model 5 handles situations where output fraction below v_c causes a loss which is not
           time dependent, e.g., manned space missions where loss of a major portion of a critical
           subsystem may cause loss of life.